Scheduling in Data Intensive and Network Aware (DIANA) Grid Environments
Abstract
In Grids, scheduling decisions are often made on the basis of whether jobs are data intensive or computation intensive: in data-intensive situations jobs may be pushed to the data, and in computation-intensive situations data may be pulled to the jobs. This kind of scheduling, which takes no account of network characteristics, can degrade performance in a Grid environment and may result in large processing queues and job execution delays due to site overloads. In this paper we describe a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes into account data, processing power and network characteristics when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed, we demonstrate that the queue and execution times of data-intensive jobs can be significantly improved by introducing the proposed DIANA scheduler. The basic scheduling decisions are dictated by a weighting factor for each potential target location, which is calculated as a function of network characteristics, processing cycles, and data location and size. The job scheduler provides a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution cost. The DIANA approach considers the Grid as a combination of active network elements and takes network characteristics as a first-class criterion in the scheduling decision matrix, along with computation and data. The scheduler can then make informed decisions by taking into account the changing state of the network, the locality and size of the data, and the pool of available processing cycles.
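The ranking idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual DIANA cost function, so everything here is an assumption made for illustration: the `Site` fields, the transfer-time and queue-based cost terms, and the weights `w_net` and `w_cpu` are hypothetical and are not taken from the paper. The sketch only shows the general shape of the approach: compute a combined network and execution cost per site, rank all sites by it, and pick the cheapest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical description of a candidate site (not the paper's model)."""
    name: str
    bandwidth_mbps: float  # bandwidth from the data source to this site
    loss_rate: float       # fraction of packets lost, 0.0 <= loss_rate < 1.0
    queue_length: int      # jobs already waiting at the site
    cpu_power: float       # relative processing capacity (higher is faster)
    has_data: bool         # True if the input data is already at the site


def network_cost(site: Site, data_size_mb: float) -> float:
    """Estimated seconds needed to move the input data to the site."""
    if site.has_data:
        return 0.0
    effective_mbps = site.bandwidth_mbps * (1.0 - site.loss_rate)
    return data_size_mb * 8 / effective_mbps


def compute_cost(site: Site, job_work: float) -> float:
    """Estimated queue wait plus run time, assuming queued jobs are similar."""
    return (site.queue_length + 1) * job_work / site.cpu_power


def total_cost(site: Site, data_size_mb: float, job_work: float,
               w_net: float = 1.0, w_cpu: float = 1.0) -> float:
    """Weighted sum of the network and computation costs (assumed form)."""
    return (w_net * network_cost(site, data_size_mb)
            + w_cpu * compute_cost(site, job_work))


def rank_sites(sites: list[Site], data_size_mb: float,
               job_work: float) -> list[Site]:
    """Return all sites ordered from lowest to highest total cost."""
    return sorted(sites, key=lambda s: total_cost(s, data_size_mb, job_work))


if __name__ == "__main__":
    candidates = [
        Site("A", 100.0, 0.0, 50, 1.0, True),    # holds the data, but overloaded
        Site("B", 1000.0, 0.0, 0, 1.0, False),   # idle, fast link
        Site("C", 10.0, 0.0, 0, 1.0, False),     # idle, slow link
    ]
    for site in rank_sites(candidates, data_size_mb=1000.0, job_work=100.0):
        print(site.name, total_cost(site, 1000.0, 100.0))
```

With these made-up numbers, the site that already holds the data ranks last because of its long queue, while the idle site with a fast link ranks first. This is the kind of case where pushing the job to the data without looking at network and load conditions would be the wrong choice.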
Related articles
A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability
Data Grid is an infrastructure that controls huge amount of data files, and provides intensive computational resources across geographically distributed collaboration. The heterogeneity and geographic dispersion of grid resources and applications place some complex problems such as job scheduling. Most existing scheduling algorithms in Grids only focus on one kind of Grid jobs which can be data...
Bulk Scheduling with DIANA Scheduler
Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources and the size of the input and output files and the amount of overall storage space allocated to a user necess...
Network and Data Location Aware Job Scheduling in Grid: Improvement to GridWay Metascheduler
Grid Computing has enabled us to utilize the unused computing power (CPU cycles) of computers connected to networks (e.g. Internet). Nowadays, there are lots of scientific projects going on in the domain of High Energy Physics (HEP) and Grid infrastructure constitutes the core computing facility of these projects. One such project is LHC (Large Hadron Collider) deployed at CERN. These experimen...
A Rank-Based Hybrid Algorithm for Scheduling Data- and Computation-Intensive Jobs in Grid Environments
Scheduling is one of the most important challenges in grid computing environments. Most existing scheduling algorithms in grids only focus on one type of grid jobs which can be data-intensive or computation-intensive. However, merely considering one type of jobs in scheduling does not result in proper scheduling in the viewpoint of all system, and sometimes causes wasting of resources on the ot...
LOGOS: Enabling Local Resource Managers for the Efficient Support of Data-Intensive Workflows within Grid Sites
In this study we discuss how to enable grid sites for the support of data-intensive workflows. Usually, within grid sites, tasks and resources are administrated by local resource managers (LRMs). Many of LRMs have been designed for managing compute-intensive applications. Therefore, data-intensive workflow applications might not perform well on such environments due to the number and size of da...
Journal title:
- CoRR
Volume: abs/0707.0862
Publication date: 2007